


1 TCGCGGGAGC CAGAGGGCCC TGCGGTCCTC GGTGGTCTTG CCAGCCCCTC
51 CTCATCCCAG GGCCCTCCGC GCCTGTGAGG ACTCCCTCAG GTCGGCCACG 
101 GGACCTGACG CAACAGGATG GACGAGTCCC CTGAGCCTCT GCAGCAGGGC 
151 AGAGGGCCGG TGCCGGTCCG ACGGCAGCGC CCAGCACCCC GGGGTCTGCG 
201 TGAGATGCTG AAGGCCAGGC TGTGGTGCAG CTGCTCGTGC AGTGTGCTGT 
251 GCGTCCGGGC GCTGGTGCAG GACCTGCTCC CCGCCACGCG CTGGCTGCGT 
301 CAGTACCGCC CGCGGGAGTA CCTGGCAGGC GACGTCATGT CTGGGCTGGT 
351 CATCGGCATC ATCCTGGTGC CGCAGGCCAT CGCCTACTCA TTGCTGGCCG
401 GGCTGCAGCC CATCTACAGC CTCTATACGT CCTTCTTCGC CAACCTCATC
451 TACTTCCTCA TGGGCACCTC ACGGCATGTC TCCGTGGGCA TCTTCAGCCT
501 GCTTTGCCTC ATGGTGGGGC AGGTGGTGGA CCGGGAGCTC CAGCTGGCCG 
551 GCTTTGACCC CTCCCAGGAC GGCCTGCAGC CCGGAGCCAA CAGCAGCACC 
601 CTCAACGGCT CGGCTGCCAT GCTGGACTGC GGGCGTGACT GCTACGCCAT 
651 GGGTGTCGCC ACCGCCCTCA CGCTGATGAC CGGGCTTTAC CAGGTCCTCA
701 TGGGCGTCCT CCGGCTGGGC TTCGTGTCCG CCTACCTCTC ACAGCCACTG 
751 CTCGATGGCT TTGCCATGGG GGCCTCCGTG ACCATCCTGA CCTCGCAGCT 
801 CAAACACCTG CTGGGCGTGC GGATCCCGCG GCACCAGGGG CCCGGCATGG 
851 TGGTCCTCAC ATGGCTGAGC CTGCTGCGCG GCGCCGGGCA GGCCAACGTG 
901 TGCGACGTGG TCACCAGCAC GGTGTGCCTG GCGGTGCTGC TAGCCGCGAA 
951 GGAGCTCTCA GACCGCTACC GACACCGCCT GAGGGTGCCG CTGCCCACGG 
1001 AGCTGCTGGT CATCGTGGTG GCCACACTCG TGTCGCACTT CGGGCAGCTC 
1051 CACAAGCGCT TTGGCTCGAG CGTGGCTGGC GACATCCCCA CGGGTTTCAT 
1101 GCCCCCTCAG GTCCCAGAGC CCAGGCTGAT GCAGCGTGTG GCTTTGGATG 
1151 CCGTGGCCCT GGCCCTCGTG GCTGCCGCCT TCTCCATCTC GCTGGCGGAG 
1201 ATGTTCGCCC GCAGTCACGG CTACTCTGTG CGTGCCAACC AGGAGCTGCT 
1251 GGCTGTGGGC TGCTGCAACG TGCTACCCGC CTTCCTCCAC TGCTTCGCCA 
1301 CCAGCGCCGC CCTGGCCAAG AGCCTGGTGA AGACAGCCAC TGGCTGCCGG 
1351 ACACAGCTGT CCAGCGTGGT CAGCGCCACC GTGGTGCTGC TGGTGCTGCT 
1401 GGCGCTGGCA CCGCTGTTCC ACGACCTACA GCGAAGCGTG CTGGCCTGCG
1451 TCATCGTGGT CAGCCTGCGG GGGGCCCTGC GCAAGGTGTG GGACCTCCCG
1501 CGGCTGTGGC GGATGAGCCC GGCTGACGCG CTGGTCTGGG CAGGCACCGC 
1551 GGCCACCTGT ATGCTGGTCA GCACAGAGGC CGGGCTGCTG GCTGGCGTCA 
1601 TCCTCTCGCT GCTCAGCCTG GCCGGCCGCA CCCAACGCCC ACGCACCGCC 
1651 CTGCTGGCCC GCATCGGGGA CACGGCCTTC TACGAGGATG CCACAGAGTT 
1701 CGAGGGCCTC GTCCCTGAGC CCGGCGTGCG GGTGTTCCGC TTTGGGGGGC 
1751 CGCTGTACTA TGCCAACAAG GACTTCTTCC TGCAGTCACT CTACAGCCTC 
1801 ACGGGGCTGG ACGCAGGGTG CATGGCTGCC AGGAGGAAGG AGGGGGGCTC 
1851 AGAGACGGGG GTCGGTGAGG GAGGCCCTGC CCAGGGCGAG GACCTGGGCC 
1901 CGGTTAGCAC CAGGGCTGCG CTGGTGCCCG CAGCGGCCGG CTTCCACACA 
1951 GTGGTCATCG ACTGCGCCCC GCTGCTGTTC CTAGACGCAG CCGGTGTGAG 
2001 CACGCTGCAG GACCTGCGCC GAGACTACGG GGCCCTGGGC ATCAGCCTGC 
2051 TGCTAGCCTG CTGCAGCCCG CCTGTGAGAG ACATTCTGAG CAGAGGAGGC 
2101 TTCCTCGGGG AGGGCCCCGG GGACACGGCT GAGGAGGAGC AGCTGTTCCT 
2151 CAGTGTGCAC GATGCCGTGC AGACAGCACG AGCCCGCCAC AGGGAGCTGG 
2201 AGGCCACGGA TGTCCATCTG TAGCAGGGCC AGGCCTGCCC AGCAGCCTCT 
2251 GCTCCCTCCT GGGGACCCAC AGCAGACGTC TGCAAGCCAC TGCTGAGACC 
2301 CTTCCCAGGG AGGAGCCACC CAAGAGCTGC ACTCTTGTGC CACAGCTGCC
2351 CTGGGGAAAG CGGGGAACCC CAACTGGGAA AGGAGGCCCT CTGATCACAC 
2401 GCAGGACCCA AACACTCAGA AATCAAGAAC CTCTGCCTCC GAGACAGGCT
2451 GGCCCACAGT GCTGGCTGGG CCCCAATGCA CCGTCCCTCA GCTCAGAAGG 
2501 GATGGGCCTG ACCTGACGCT CAGGGTTGAC ATCTTATTTG AACAAGGGTC 
2551 CCCCGCCATC ATGCAGCCTC CAAGGTGCCA AGAGGACTCC CTATGCCCAG 
2601 GCCTGCCCGG TGCCCACCCT GCTGGTAGGA GCCAGCGGCT CTGGCCAAGT 
2651 GCACGAGGGT CTCTGTGTTT CCAGAAGGCC CCACACACCC AAGTGCCCCT 
2701 CACACCTCGT GCCTCCCCCT CACAGGGTGG CCACCTGCAC CAGCGTCAGG 
2751 GCCCAGGGTG CTGTGACCGA TGAGACCTCA GCTCAGCCCT CAGGTGCAGT 
2801 GGCCCTACCC AGCCTGGCCA GCAGACACAC ACAGGGATGC TCACGGGTGC 
2851 ACCAGGAGCC AGGTGCGGCG CAGCCAACCC TGAGCCTGCA GGGAGACCTG 
2901 CAGGAAGCCC ACCGTGCCCC ATGCAGGGGC TCCCTCCAGC ACACAGCCCT 
2951 CACCCCAGCA CAGCCAGCAA GGACACGCTC TCCCCAACAG GGTGCTTCGG 
3001 CGGGAGGTGG GGGAACAAGG GGTCTTCCGA GCAGCCCCCA GCCCTCCCCT 
3051 CCCATCTGTG CCTCTGTAAG GGGCTCTGGG ACGCCCAGAC CCTGCCCGCC 
3101 GCCCACCTGG TGGTGACAAG CTCCAGCAGC CAGTGGGTCC GGACCTGCTT 
3151 GATGGCGCGG TGAGGGACGG CGCCCACATA GGCGAGGTTG AGCTGCTGGT
3201 CCCAGCTGAG GACGTACTGG TCAGCCTGGC TGTGTGGCAG CGGGGGGCTG
3251 GGGACAACAA AGGGGCGGCT CAGTCCCGAG CCTCAGCATG GCTGGCAGCG
3301 CGGCTGACAC ACACGTTCAA GCCCAGGACT GCCCGGGCGC AGGATCCAGG
3351 CGCTGCCCGT GCGTTCAGTG ACTAATAAAA TGACCCTTAG GGCCAGGAAA
3401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAA (SEQ ID NO:1)



FEATURES: 

5'UTR: 1-117 

Start Codon: 118 

Stop Codon: 2221 

3'UTR: 2224
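
The feature coordinates above can be cross-checked against the printed sequence. The following is a minimal sketch, not part of the original listing: the variable names are hypothetical, the fragment is cDNA positions 101-150 copied from SEQ ID NO:1, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (an assumption
# of this sketch; the listing itself does not state which code was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

START, STOP = 118, 2221      # 1-based positions from the FEATURES block
cds_nt = STOP - START        # coding nucleotides, stop codon excluded
n_residues = cds_nt // 3     # expected protein length

# cDNA positions 101-150 as printed (hypothetical variable name).
fragment_101_150 = "GGACCTGACGCAACAGGATGGACGAGTCCCCTGAGCCTCTGCAGCAGGGC"
orf = fragment_101_150[START - 101:]   # reading frame begins at position 118
peptide = "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))

print(cds_nt, n_residues, peptide)
```

Under these assumptions the coding region spans 2103 nucleotides, which corresponds to the 701 residues of SEQ ID NO:2, and the first translated codons reproduce the start of the protein sequence.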




HOMOLOGOUS PROTEINS: 

Top BLAST Hits: 



CRA|335001098671800 /altid=gi|11545741 /def=ref|NP_071325.1| so.
CRA|335001098639224 /altid=gi|11560117 /def=ref|NP_071623.1| su.
CRA|1000746201930 /altid=gi|6746349 /def=emb|CAB69640.1| (AJ223.
CRA|18000004923413 /altid=gi|4557539 /def=ref|NP_000103.1| sulf.
CRA|18000004971635 /altid=gi|627422 /def=pir||A54808 diastrophi.
CRA|154000124061898 /altid=gi|12054717 /def=emb|CAC20729.1| (Yl.
CRA|18000005144885 /altid=gi|6015035 /def=sp|O70531|DTD_RAT SUL.
CRA|18000004938377 /altid=gi|6681233 /def=ref|NP_031911.1| dias.
CRA|108000024647870 /altid=gi|12730580 /def=ref|XP_011158.1| so.
CRA|1000682322799 /altid=gi|6755022 /def=ref|NP_035997.1| pendr.



BLAST dbEST Hits: 

gi|10209038 /dataset=dbest /taxon=96.. 
gi|7140527 /dataset=dbest /taxon=9606.
gi|5847932 /dataset=dbest /taxon=9606
EXPRESSION INFORMATION FOR MODULATORY USE:

library source (from BLAST dbEST hits):
gi|10209038 Lung
gi|7140527 Lymph
gi|5847932 Kidney

Tissue Screening Panels:
Human heart
Human Leukocyte
Thyroid
Pituitary
Brain
Fetal brain
Adrenal gland
Testis
Kidney
Small intestine
Pancreas
Liver
Lung
Placenta
Skeletal muscle
Spleen
HeLa cells

1 MDESPEPLQQ GRGPVPVRRQ RPAPRGLREM LKARLWCSCS CSVLCVRALV

51 QDLLPATRWL RQYRPREYLA GDVMSGLVIG IILVPQAIAY SLLAGLQPIY 

101 SLYTSFFANL IYFLMGTSRH VSVGIFSLLC LMVGQVVDRE LQLAGFDPSQ

151 DGLQPGANSS TLNGSAAMLD CGRDCYAIRV ATALTLMTGL YQVLMGVLRL 

201 GFVSAYLSQP LLDGFAMGAS VTILTSQLKH LLGVRIPRHQ GPGMVVLTWL 

251 SLLRGAGQAN VCDVVTSTVC LAVLLAAKEL SDRYRHRLRV PLPTELLVIV

301 VATLVSHFGQ LHKRFGSSVA GDIPTGFMPP QVPEPRLMQR VALDAVALAL 

351 VAAAFSISLA EMFARSHGYS VRANQELLAV GCCNVLPAFL HCFATSAALA 

401 KSLVKTATGC RTQLSSVVSA TVVLLVLLAL APLFHDLQRS VLACVIVVSL

451 RGALRKVWDL PRLWRMSPAD ALVWAGTAAT CMLVSTEAGL LAGVILSLLS 

501 LAGRTQRPRT ALLARIGDTA FYEDATEFEG LVPEPGVRVF RFGGPLYYAN 

551 KDFFLQSLYS LTGLDAGCMA ARRKEGGSET GVGEGGPAQG EDLGPVSTRA 

601 ALVPAAAGFH TVVIDCAPLL FLDAAGVSTL QDLRRDYGAL GISLLLACCS 

651 PPVRDILSRG GFLGEGPGDT AEEEQLFLSV HDAVQTARAR HRELEATDVH 

701 L (SEQ ID NO:2)

FEATURES:

Functional domains and key regions: 

[1] PDOC00001 PS00001 ASN_GLYCOSYLATION 
N-glycosylation site 

Number of matches: 2 

1 158-161 NSST

2 163-166 NGSA

[2] PDOC00005 PS00005 PKC_PHOSPHO_SITE 
Protein kinase C phosphorylation site 

Number of matches: 7 

1 117-119 TSR 

2 281-283 SDR

3 370-372 SVR 

4 449-451 SLR 

5 505-507 TQR 

6 597-599 STR 

7 686-688 TAR 

[3] PDOC00006 PS00006 CK2_PHOSPHO_SITE 
Casein kinase II phosphorylation site 

Number of matches: 7 

1 358-361 SLAE 

2 467-470 SPAD 

3 526-529 TEFE

4 562-565 TGLD 

5 629-632 TLQD 

6 670-673 TAEE 

7 679-682 SVHD

[4] PDOC00007 PS00007 TYR_PHOSPHO_SITE

Tyrosine kinase phosphorylation site 

515-522 RIGDTAFY 

[5] PDOC00008 PS00008 MYRISTYL 

N-myristoylation site



Number of matches: 15 

1 76-81 GLVIGI

2 152-157 GLQPGA

3 156-161 GANSST 

4 218-223 GASVTI 

5 255-260 GAGQAN 
[6] PDOC00012 PS00012 PHOSPHOPANTETHEINE
Phosphopantetheine attachment site



411-426 RTQLSSVVSATVVLLV 

[7] PDOC00870 PS01130 SULFATE_TRANSP
Sulfate transporters signature 

98-119 PIYSLYTSFFANLIYFLMGTSR 
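
The motif hits listed above can be reproduced with a simple pattern scan. The sketch below is illustrative only: it expresses the PROSITE PS00001 pattern N-{P}-[ST]-{P} as a regular expression and applies it to residues 151-170 of SEQ ID NO:2; the function and variable names are hypothetical and this is not the tool that generated the listing.

```python
import re

# PROSITE PS00001 (N-{P}-[ST]-{P}) written as a regex. The lookahead allows
# overlapping matches to be reported.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")

def find_sites(segment, first_residue):
    """Return (start, end, motif) tuples in 1-based protein coordinates."""
    return [
        (m.start() + first_residue, m.start() + first_residue + 3, m.group(1))
        for m in N_GLYC.finditer(segment)
    ]

# Residues 151-170 of SEQ ID NO:2 as printed in the listing.
segment_151_170 = "DGLQPGANSSTLNGSAAMLD"
sites = find_sites(segment_151_170, 151)
print(sites)
```

On this fragment the scan reports the same two N-glycosylation sites as the feature table, at 158-161 and 163-166.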

Membrane spanning structure and domains:

BLAST Alignment to Top Hit:

>CRA|335001098671800 /altid=gi|11545741 /def=ref|NP_071325.1| solute
carrier family 26 (sulfate transporter), member 1 [Homo
sapiens] /org=Homo sapiens /taxon=9606 /dataset=nraa
/length=701

Length = 701



Score = 1385 bits (3545), Expect = 0.0

Identities = 698/701 (99%), Positives = 698/701 (99%) 

Frame = +1 

Query: 1 MDESPEPLQQGRGPVPVRRQRPAPRGLREMLKARLWCSCSCSVLCVRALVQDLLPATRWL 180 

MDESPEPLQQGRGPVPVRRQRPAPRGLREMLKARLWCSCSCSVLCVRALVQDLLPATRWL
Sbjct: 1 MDESPEPLQQGRGPVPVRRQRPAPRGLREMLKARLWCSCSCSVLCVRSLVQDLLPATRWL 60

Query: 181 RQYRPREYLAGDVMSGLVIGIILVPQAIAYSLLAGLQPIYSLYTSFFANLIYFLMGTSRH 360 

RQYRPREYLAGDVMSGLVIGIILVPQAIAYSLLAGLQPIYSLYTSFFANLIYFLMGTSRH 
Sbjct: 61 RQYRPREYLAGDVMSGLVIGIILVPQAIAYSLLAGLQPIYSLYTSFFANLIYFLMGTSRH 120 

Query: 361 VSVGIFSLLCLMVGQVVDRELQLAGFDPSQDGLQPGANSSTLNGSAAMLDCGRDCYAIRV 540

VSVGIFSLLCLMVGQVVDRELQLAGFDPSQDGLQPGANSSTLNGSAAMLDCGRDCYAIRV
Sbjct: 121 VSVGIFSLLCLMVGQVVDRELQLAGFDPSQDGLQPGANSSTLNGSAAMLDCGRDCYAIRV 180 

Query: 541 ATALTLMTGLYQVLMGVLRLGFVSAYLSQPLLDGFAMGASVTILTSQLKHLLGVRIPRHQ 720 

ATALTLMTGLYQVLMGVLRLGFVSAYLSQPLLDGFAMGASVTILTSQLKHLLGVRIPRHQ
Sbjct: 181 ATALTLMTGLYQVLMGVLRLGFVSAYLSQPLLDGFAMGASVTILTSQLKHLLGVRIPRHQ 240 

Query: 721 GPGMVVLTWLSLLRGAGQANVCDVVTSTVCLAVLLAAKELSDRYRHRLRVPLPTELLVIV 900

GPGMVVLTWLSLLRGAGQANVCDVVTSTVCLAVLLAAKELSDRYRHRLRVPLPTELLVIV
Sbjct: 241 GPGMVVLTWLSLLRGAGQANVCDVVTSTVCLAVLLAAKELSDRYRHRLRVPLPTELLVIV 300

Query: 901 VATLVSHFGQLHKRFGSSVAGDIPTGFMPPQVPEPRLMQRVALDAVALALVAAAFSISLA 1080 

VATLVSHFGQLHKRFGSSVAGDIPTGFMPPQVPEPRLMQRVALDAVALALVAAAFSISLA 
Sbjct: 301 VATLVSHFGQLHKRFGSSVAGDIPTGFMPPQVPEPRLMQRVALDAVALALVAAAFSISLA 360 

Query: 1081 EMFARSHGYSVRANQELLAVGCCNVLPAFLHCFATSAALAKSLVKTATGCRTQLSSVVSA 1260 

EMFARSHGYSVRANQELLAVGCCNVLPAFLHCFATSAALAKSLVKTATGCRTQLSSVVSA
Sbjct: 361 EMFARSHGYSVRANQELLAVGCCNVLPAFLHCFATSAALAKSLVKTATGCRTQLSSVVSA 420 

Query: 1261 TVVLLVLLALAPLFHDLQRSVLACVIVVSLRGALRKVWDLPRLWRMSPADALVWAGTAAT 1440 

TVVLLVLLALAPLFHDLQRSVLACVIVVSLRGALRKVW  PRLWRMSPADALVWAGTAAT
Sbjct: 421 TVVLLVLLALAPLFHDLQRSVLACVIVVSLRGALRKVWGFPRLWRMSPADALVWAGTAAT 480

Query: 1441 CMLVSTEAGLLAGVILSLLSLAGRTQRPRTALLARIGDTAFYEDATEFEGLVPEPGVRVF 1620 

CMLVSTEAGLLAGVILSLLSLAGRTQRPRTALLARIGDTAFYEDATEFEGLVPEPGVRVF 
Sbjct: 481 CMLVSTEAGLLAGVILSLLSLAGRTQRPRTALLARIGDTAFYEDATEFEGLVPEPGVRVF 540

Query: 1621 RFGGPLYYANKDFFLQSLYSLTGLDAGCMAARRKEGGSETGVGEGGPAQGEDLGPVSTRA 1800 

RFGGPLYYANKDFFLQSLYSLTGLDAGCMAARRKEGGSETGVGEGGPAQGEDLGPVSTRA 
Sbjct: 541 RFGGPLYYANKDFFLQSLYSLTGLDAGCMAARRKEGGSETGVGEGGPAQGEDLGPVSTRA 600 

Query: 1801 ALVPAAAGFHTVVIDCAPLLFLDAAGVSTLQDLRRDYGALGISLLLACCSPPVRDILSRG 1980 

ALVPAAAGFHTVVIDCAPLLFLDAAGVSTLQDLRRDYGALGISLLLACCSPPVRDILSRG 
Sbjct: 601 ALVPAAAGFHTVVIDCAPLLFLDAAGVSTLQDLRRDYGALGISLLLACCSPPVRDILSRG 660

Query: 1981 GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATDVHL 2103 

GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATD HL 
Sbjct: 661 GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATDAHL 701 (SEQ ID NO:4)
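
Differences between the query and the top hit can be located by comparing aligned rows column by column. The sketch below uses the final alignment block (subject residues 661-701) as printed; the helper name `mismatches` is hypothetical, and the integer percentage assumes the reported 99% is a truncated value of 698/701.

```python
def mismatches(query, sbjct, sbjct_start):
    """List (position, query_residue, sbjct_residue) for differing columns."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return [
        (sbjct_start + i, q, s)
        for i, (q, s) in enumerate(zip(query, sbjct))
        if q != s
    ]

# Final alignment block as printed in the listing (ungapped).
query_row = "GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATDVHL"
sbjct_row = "GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATDAHL"
diffs = mismatches(query_row, sbjct_row, 661)

# Identity as an integer percentage, truncated rather than rounded.
identity_pct = 698 * 100 // 701

print(diffs, identity_pct)
```

The single difference in this block falls at protein position 699 (V in the query, A in the subject).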




Hmmer search results (Pfam):

Model    Description                                   Score  E-value   N
PF00916  Sulfate transporter family                    405.6  4.7e-118
CE00008  E00008 GUANYLIN                               8.6    0.016
PF00497  Bacterial extracellular solute-binding prote  4.4    0.57

Parsed for domains:

Model    Domain  seq-f  seq-t     hmm-f  hmm-t     score  E-value
PF00497  1/1     338    356   ..  1      27    [.  4.4    0.57
CE00008  1/1     409    431   ..  1      24    [.  8.6    0.016
PF00916  1/1     195    505   ..  1      328   []  405.6  4.7e-118
1 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN
51 NNNNNNNNNN NNNTGGTGAA ACCCCGTCTC TACTAAAAAT ACAAAAAATT 
101 AGCCGGGCGT GGTGGCGGGT GCCTGTAGTC CCAGCTACTC GGGAGGCTGA 
151 GGCAGGAGAA TCACTTGAAC CCGGGAGACA GAGCTTGCAG TGAGCCGAGA 
201 TCATGCCACT GTACTCCAGC CTGGGCAACA GAGCGAAACT CCGTCTCAAA 
251 AAAAAAAAAA TTAGCCGGGC GCGGTGGCGG GCGCCTGTAG TCCCAGCTAC 
301 TCAGGAGGCT GAGGCAGGAG AATGGCGTGA ACCCAGGAGG CAGAGCTTCC 
351 AGTGAGCCGA GATCACACCA CTGCATTCCG GCCTGGGTGA CAGAGCAAGA 
401 CTCCGCCTCA AAAAAAAAAA AAGAAAAGGT GGGGGGCGTC TCACTATGTT 
451 GACCAGGCTG GTCTTGAACT GCTGGCCTTA AGCGATCCTC CTGTCTAGGC
501 CTCCCAAAGT GTTGGAATTA CAGGAGTGAA CCATCGTGCC TGGCTAATAA
551 TTCCTTTTAA AAAGCAGCTT ACCCTTATTT TCACGTGTGG GCCTAATTTA 
601 GTTCACTTAA AAAAATCATT TATCTTCACC CCAGCCCTAT GAGGCAGGCA 
651 GTGGGGGTGG TGGTGTGTGG TAGAGGGGAG GGCAGAGGAG CCGTGAGGGf
701 GACCAGGCGC TGTGGGTCGG TGCTGGGTCC AGTCAGACCA GGACTCCTGG 
751 CCAGTCACGG CACCTTGACC CCGGCAGTCC TCGCCCTGGG CGGTGAGCAC 
801 CACACACAGG GCTTACGCGA GCACACACGC ATATGCACGC ACCGGCAGCC 
851 TTGGGCTGAG CCGGCTGTCA GCCTCTGCCC TGCTCCAGCT TGGACCAGGC 
901 TGGCTCCTTG CAGGACCAGG AGGGTGTCCG GCGACTGGAC ACGGAGACCA 
951 AGCCTCCCTC AGCCCCGCCT GGGTTTGAAG GCTGCTGCAC TCGACCCCAG 
1001 ACCCCAGAGC TGAAGGTTTA CCTGTGCTCA GCCCCTGAGC CCCCGCCTCC 
1051 CGCTGGTCCC TAAGCCCCCC CGGCAGGGCC GCAGAGCCAC AGCTGCAGCC 
1101 GCTCCTGGGA GGCTGGGAGC TCCTCAGAGG CCCACACAGC TCTAACTACT 
1151 ACAAGCCCCT GATTACAGTT CAACTCCCGG ATCAGCCGAT CAGGTAACAT 
1201 GGCTGGAGAA ACCCGTGACT CAGCAATCTG TAGGTAAATA ATTGAACTAC 
1251 AGAGTCCAGG GCACAGACCA CTGCCTGCAG GTTGGCGCCA CCACCCCCAC 
1301 TCTCCCCGCT GCTCGCGGGA GCCAGAGGGC CCTGCGGTCC TCGGTGGTCT 
1351 TGCCAGCCCC TCGTCATCCC AGGGCCCTCC GCGCCTGTGA GGACTCCCTC 
1401 AGGTAAGAAC CATCCTGGGC CCAGATCTCA GCTGCAGCAG AGGGGGGCGT
1451 GGGAGCCGAG GCCAGAAATG CCCTGGACTC GTGGTTTCTT AGGGGCACCC 
1501 TCAGGCTCAA GGCAGGTGGC CCTACTGTCC CCATTCCACA CACCTGGACC 
1551 CCAGGGGCTT GGGGTGGGCT TCAGGGCATC CAGGGACCCA GTGTGGTGGG 
1601 GTCTTCCAGG GAAGGGGACA CAACTCTTGC AATGTTGCCT GAGGGCCAGG 
1651 ACCCCCGCTC TGTGCCCCAG GGGTGCTGTG CCCAGCCTGC ATGTGTCAAC 
1701 CTACCAGGCT GGGCTCACTG CCCCAACACA CCCGCCAGGA GACTGGAGCT 
1751 CGCACACCCT GGGCCAGCGT GCAAACAGCA GGCTCAGCCC AGGCTCCAGG 
1801 GTGTCCTGGG CACCTGGTGT CCTGGGAGCA AAGTCTTTGC CTAACGTCGC 
1851 TGAGAAGAAT GTTTAAAGTG AAAGTACATT GGAGTCTGCA AACAGGACAG 
1901 ACCCGAGGCC TCACGTGGGA CCAGTCAGGC CTCTAAGCAC CGCCTCCCTA 
1951 ACGCCACGGT GTTTTCCGAG ATCAAGGGAA AGGTCAGGTG CCCTTCCGGC 
2001 TCTGCCGGCC CAGGGTGACT GTGTGCAGCG GGGTGGGCCC TCTCGGTGCT 
2051 GCCTCGGGAC AGTGTGTCAT GGCCGTTCCA CAGTGAGCTG GTGCAGCCTG 
2101 GGAAAAAGGG CGCCTCACGT CCCAGAACTG TCTGGGCAGG GGAGACAGAC
2151 GCCAGTCACC CTCCTCCCCT CCCAGCTGGC CCTGATGGGG CCCCCGTCCA 
2201 GGCATATTCT CAGAATTCTG TCCCAAGTCC AGGCGGATGG GCTAGGCTAG 
2251 TGTCTGAGTG CTGCTCCCCC AGCAGACTTG GGGTCCCAGT ACCCACAAAG 
2301 CTTGGCAGGG ACATAGGAGG CCTCTTCCTG AGACTTCCGC CAGCCCCAGG 
2351 ACCCACAGGG CAGGTGACAG AGGGGTGGGT GGAGGTGTCT CCAGGAGAGC 
2401 AGGCGATGGT TTGGATGGGG GAGGGAGGGC TCTGGTGTGG GCATGGGGTG 
2451 GACAGCAGGA CCGTTTGCCA ACCTGGGGAG CCAGGGAGGT GGACACGGAG 
2501 CAGCTGGACT CAGGCTTGCC TGCACCTGTG TCCAGTGACT GTGACATTCT 
2551 GACGGTAGGC ACATGTGCGT GGTGGCAGCC CAGCCTGTTC CTGCCCCGTT 
2601 GGGGAGGTGA GCTTCAGGAG GCTACAGGGT GGTTTTCAGC CAGGAACCGC 
2651 AGAGCCAATA GGCCGGAGCT GAGCCTGGAC AGGGTGCCGC CACGCCGCCC 
2701 CTCAGCACTG CTGGCCTCAG CACACCCCAT GGCATGGGCT TGGTGTCGTA
2751 ATCCCATCTC ACCCCACGAT GGATTCTGGA TCCAGCAGGG CCCAGCGTCC
2801 ATCCATACCG GGCAGGGGGC TGGGGCCCGC GCTGCCAGGA GAAGGCCCAG 
2851 CACCAATCCC CGGCCCTGGG TGGGCGAGGG GTCCGCCCCA AGGGGCCCGT 
2901 TGCTGCCGGG GACCTTGTCG TTTGGCCCTG GATCCGGGGG CTCCTGTGAC 
2951 GATGCCCTCT TCTCGGCCGC AGGTCGGCCA CGGGACCTGA CGCAACAGGA 
3001 TGGACGAGTC CCCTGAGCCT CTGCAGCAGG GCAGAGGGCC GGTGCCGGTC 
3051 CGACGGCAGC GCCCAGCACC CCGGGGTCTG CGTGAGATGC TGAAGGCCAG 
3101 GCTGTGGTGC AGCTGCTCGT GCAGTGTGCT GTGCGTCCGG GCGCTGGTGC 
3151 AGGACCTGCT CCCCGCCACG CGCTGGCTGC GTCAGTACCG CCCGCGGGAG
3201 TACCTGGCAG GCGACGTCAT GTCTGGGCTG GTCATCGGCA TCATCCTGGT 
3251 GCCGCAGGCC ATCGCCTACT CATTGCTGGC CGGGCTGCAG CCCATCTACA 
3301 GCCTCTATAC GTCCTTCTTC GCCAACCTCA TCTACTTCCT CATGGGCACC 
3351 TCACGGCATG TCTCCGTGGG CATCTTCAGC CTGCTTTGCC TCATGGTGGG 
3401 GCAGGTGGTG GACCGGGAGC TCCAGCTGGC CGGCTTTGAC CCCTCCCAGG 
3451 ACGGCCTGCA GCCCGGAGCC AACAGCAGCA CCCTCAACGG CTCGGCTGCC 
3501 ATGCTGGACT GCGGGCGTGA CTGCTACGCC ATCCGTGTCG CCACCGCCCT 
3551 CACGCTGATG ACCGGGCTTT ACCAGGTGAG GAGCCCTGCT TGGGCACAGG 
3601 GAGGGGCCCA GGGCACCCCC CCTTAGGTTT TGGCCATCCA CGAGGGCAAG 
3651 GCTGGGGGCA AGCACAGGGT TGGCAGAGGA GGTGCTGGCC CAAGACAGCA 
3701 AGGCTTGGGC AGAGCTGGGG CGTGCCGGGG CATCCCAGGG CGAGGCACCG 
3751 ACGCGGAGAG GCTGTGGATG CAGGAGGGGA GGGGCACGGG GAGCCAGTCC 
3801 GGTGGGCCAT GGCCTTGGTG GGGAGGAGGA GGGGAGGTGT GGGTGTGGCT
3851 CAGTGGTGCT GGACTGAGGC CATGTGGCCT CCCAGGCCTT CTGTCCTAGG 
3901 TGGAGTGGGG GATGGCCTCC CCACCCCCGA AGGTCTCCTG CCTTGGCCTG 
3951 TCCACCTTGG CCCCCGTTGG CTCCACATCT GCATGGGGGG CAGTGGGCAC 
4001 CATGTGTAGG AAGCAGCAGG AAGGGGTTGC CTTCTGATAC CAGAGGTCTT 
4051 AATTCTGAAA TAAAACGGGC TGCTGCACGT GACAAGGGTT AGACGTGTCT
4101 ATGGCCAGCT GTGTGCACGT GTGATGCTCA CGTGGATGTC ACAGTTGTCT 
4151 GCGGGCATGA GCACGCGTGG AACCAGAACT CAGGCCCGTG TGAGGAGTCT 
4201 GGTTTGGAAC ACACGGGGCC GCAACACAGA ATTGTCAGGT CCTGTGCCGT
4251 GACCACCACC CCTCGGGCCA TGCCAGGTGC TGGTGAGGGG CAGGTGGCTC
4301 CCGCCAGGCG CCTGCTGGCC TGACCGCACT CCGTCCACAG GTCCTCATGG
4351 GCGTCCTCCG GCTGGGCTTC GTGTCCGCCT ACCTCTCACA GCCACTGCTC
4401 GATGGCTTTG CCATGGGGGC CTCCGTGACC ATCCTGACCT CGCAGCTCAA 
4451 ACACCTGCTG GGCGTGCGGA TCCCGCGGCA CCAGGGGCCC GGCATGGTGG 
4501 TCCTCACATG GCTGAGCCTG CTGCGCGGCG CCGGGCAGGC CAACGTGTGC
4551 GACGTGGTCA CCAGCACGGT GTGCCTGGCG GTGCTGCTAG CCGCGAAGGA
4601 GCTCTCAGAC CGCTACCGAC ACCGCCTGAG GGTGCCGCTG CCCACGGAGC
4651 TGCTGGTCAT CGTGGTGGCC ACACTCGTGT CGCACTTCGG GCAGCTCCAC
4701 AAGCGCTTTG GCTCGAGCGT GGCTGGCGAC ATCCCCACGG GTTTCATGCC 
4751 CCCTCAGGTC CCAGAGCCCA GGCTGATGCA GCGTGTGGCT TTGGATGCCG 
4801 TGGCCCTGGC CCTCGTGGCT GCCGCCTTCT CCATCTCGCT GGCGGAGATG
4851 TTCGCCCGCA GTCACGGCTA CTCTGTGCGT GCCAACGAGG AGCTGCTGGC
4901 TGTGGGCTGC TGCAACGTGC TACCCGCCTT CCTCCACTGC TTCGCCACCA
4951 GCGCCGCCCT GGCCAAGAGC CTGGTGAAGA CAGCCACTGG CTGCCGGACA
5001 CAGCTGTCCA GCGTGGTCAG CGCCACCGTG GTGCTGCTGG TGCTGCTGGC 
5051 GCTGGCACCG CTGTTCCACG ACCTACAGCG AAGCGTGCTG GCCTGCGTCA 
5101 TCGTGGTCAG CCTGCGGGGG GCCCTGCGCA AGGTGTGGGA CCTCCCGCGG 
5151 CTGTGGCGGA TGAGCCCGGC TGACGCGCTG GTCTGGGCAG GCACCGCGGC 
5201 CACCTGTATG CTGGTCAGCA CAGAGGCCGG GCTGCTGGCT GGCGTCATCC 
5251 TCTCGCTGCT CAGCCTGGCC GGCCGCACCC AACGCCCACG CACCGCCCTG 
5301 CTGGCCCGCA TCGGGGACAC GGCCTTCTAC GAGGATGCCA CAGAGTTCGA 
5351 GGGCCTCGTC CCTGAGCCCG GCGTGCGGGT GTTCCGCTTT GGGGGGCCGC 
5401 TGTACTATGC CAACAAGGAC TTCTTCCTGC GGTCACTCTA CAGCCTCACG 
5451 GGGCTGGACG CAGGGTGCAT GGCTGCCAGG AGGAAGGAGG GGGGCTCAGA
5501 GACGGGGGTC GGTGAGGGAG GCCCTGCCCA GGGCGAGGAC CTGGGCCCGG 
5551 TTAGCACCAG GGCTGCGCTG GTGCCCGCAG CGGCCGGCTT CCACACAGTG
5601 GTCATCGACT GCGCCCCGCT GCTGTTCCTA GACGCAGCTG GTGTGAGCAC 
5651 GCTGCAGGAC CTGCGCCGAG ACTACGGGGC CCTGGGCATC AGCCTGCTGC 
5701 TAGCCTGCTG CAGCCCGCCT GTGAGAGACA TTCTGAGCAG AGGAGGCTTC 
5751 CTCGGGGAGG GCCCCGGGGA CACGGCTGAG GAGGAGCAGC TGTTCCTCAG 
5801 TGTGCACGAT GCCGTGCAGA CAGCACGAGC CCGCCACAGG GAGCTGGAGG 
5851 CCACCGATGC CCATCTGTAG CAGGGCCAGG CCTGCCCAGC AGCCTCTGCT 
5901 CCCTCCTGGG GACCCACAGC AGACGTCTGC AAGCCACTGC TGAGACCCTT 
5951 CCCAGGGAGG AGCCACCCAA GAGCTGCACT CTTGTGCCAC AGCTGCCCTG 
6001 GGGAAACCGG GGAACCCCAA CTGGGAAAGG AGGCCCTCTG ATCACACGCA 
6051 GGACCCAAAC ACTCAGAAAT CAAGAACCTC TGCCTCCGAG ACAGGCTGGC 
6101 CCACAGTGCT GGCTGGGCCC CAATGCACCG TCCCTCAGCT CAGAAGGGAT 
6151 GGGCCTGACC TGACGCTCAG GGTTGACATC TTATTTGAAC AAGGGTCCCC 
6201 CGCCATCATG CAGCCTCCAA GGTGCCAAGA GGACTCCCTA TGCCCAGGCC 
6251 TGCCCGGTGC CCACCCTGCT GGTAGGAGCC AGCGGCTCTG GCCAAGTGCA 




6301 CGAGGGTCTC TGTGTTTCCA GAAGGCCCCA CACACCCAAG TGCCCCTCAC
6351 ACCTCGTGCC TCCCCCTCAC AGGGTGGCCA CCTGCACCAG CGTCAGGGCC
6401 CAGGGTGCTG TGACCGATGA GACCTCAGCT CAGCCCTCAG GTGCAGTGGC
6451 CCTACCCAGC CTGGCCAGCA GACACACACA GGGATGCTCA CGGGTGCACC
6501 AGGAGCCAGG TGCGGCGCAG CCAACCCTGA GCCTGCAGGG AGACCTGCAG
6551 GAAGCCCACC GTGCCCCATG CAGGGGCTCC CTCCAGCACA CAGCCCTCAC
6601 CCCAGCACAG CCAGCAAGGA CACGCTCTCC CCAACAGGGT GCTTCGGCGG
6651 GAGGTGGGGG AACAAGGGGT CTTCCGAGCA GCCCCCAGCC CTCCCCTCCC
6701 ATCTGTGCCT CTGTAAGGGG CTCTGGGACG CCCAGACCCT GCCCGCCGCC
6751 CACCTGGTGG TGACAAGCTC CAGCAGCCAG TGGGTCCGGA CCTGCTTGAT
6801 GCCGCGGTGA GGGACGGCGC CCACATAGGC GAGGTTGAGC TGCTGGTCCC
6851 AGCTGAGGAC GTACTGGTCA GCCTGGCTGT GTGGCAGCGG GGGGCTGGGG
6901 ACAACAAAGG GGCGGCTCAG TCCCGAGCCT CAGCATGGCT GGCAGCGCGG
6951 CTGACACACA CGTTCAAGCC CAGGACTGCC CGGGCGCAGG ATCCAGGCGC
7001 TGCCCGTGCG TTCAGTGACT AATAAAATGA CCCTTAGGGC CAGGAATGTG
7051 GGGAGGTCCC ATCTTCATGG GGAACGGCAG CAGCAGTAAG ACGAGGGGCC
7101 AACGCCAGCC CTGGCCCTGG CCCTGCCAGG AAGGCGGGTA CCTCAGCTCT
7151 AGGTGGAAGG AATGGGACAG GCAGGCCAGG TCCCGCTGCA GGGCCGTCCA
7201 CTCCCAGGGG AGACTCCTGG TTTACCTCAA AGAGCAGGAT CCCGGGCATC
7251 GGCCTGGGCT GCAGGGGGCG GCCCAGGCTC ACGCCCCGGC GCCCACTCAG
7301 GTGGAGGACC CACCCACAAA CACGGCGGGG GGCGGGCCCG GGAGAGCCAG
7351 GGCCCCAGAG GAGGGAGCTC CGGTCTCTGA AGCTCTCACA GTGCGCAGTC
7401 AGGGGGCGCC CGAGCTCTCC CCGTGCGGCC AGGGGGTCCC GGAGGCCGCG
7451 GAGCGCfCAC CAGAAGCCTG TGCTCCTCCA GAAGCGCCGC AGGGGCCACA
7501 GCGCGCGGGC CGCGTCCACC TGCACCAGGT GCGGGGCCTC GGCCGGGGCC
7551 ACCGGGGGTG CGGCCAGGAG CGAGGCCAGG AGCGCCAGCA GCGCTGCGCG
7601 GGGGCGCAGG GGACGCATGG CCACGCGTGC TCGGGGACTG CGGGGCTTCG
7651 GGCTGCACTG CCGGTTCCGC CTCCGGGTCG GAGTCTGGGC GCGCACCCCA
7701 TGTGACCGCC GCCGCGGGGC GGGGCCTTGG TGAGGGGGCG ATGGCCGGGT
7751 GGGAGGGGTT GGGTGGCCTC GGGGAGCCTC GGGGAGCCGG GAGCACGGCA
7801 GGGCTTGGAG CCCCGCTTCC TTGCGGGCCT CAGGGGCTGC TCTGAGGACC
7851 GATGACTCGG AAAGCGCTCA GAAGAACGCT TCGCCCGTTG GTGCTATGTG
7901 AGTTGAGCCA TTACTGTCTT GTTTTTCTCT GTTTTTGTGT GTTTTTGAGA
7951 CAGAGTCTTG CTTTGTCGCC CAGGCTGAGG TGCAGTGGCG CGATCTCAGC
8001 TCACTGCAAC CTCCATCTCC GGGGCTTCAG CGATTTTCTC ACCCCAGCCT
8051 CCTGAGTAAA GCGTGCGCTT TAGCAGGAAG GAGAATTACC CCAGAAGAGC
8101 ACACTGGGCC CTCCTTACAC TTGGCTTCAG ATCCATGGAT TCAACCAAGC
8151 AGACTGAAAA TATTGTTTTA AAGCCAAAGC AATACGAAAT AATACATATT
8201 TTAAAACAAT ACAGTATAAC AGCTATTTAC AGAGCATTTA CATTGTTTTA
8251 GGGACTATAA GTAATCTTGA TTTAAACTAC ACAGTAGGAT GTGCGTAGGT
8301 AATGTGCAAA TACTGTGCCA TTTTATATCA AGTACTTGAG CACCTGCAAA
8351 TTTTGGTATC TGGGAGGGTC CTGGAACCAA TACCCCGAGG ATACCATGGG
8401 ACAACTGTAG TACATGTGTA GTCCATGTAT GCATGTGTGA ATCCAAGCAA
8451 ACATTGTATA AAAATAATAA TGGAAAGAAC AGGCTTGGTG CGGTGGCTCA
8501 CACCTGCAAT CCCAGCACTT TGGAATTGCA GGCCAACACG GGAGGATCAC
8551 TTGAGGCCTG GAGTTTGAAA TCGGCCTGGG AGATGTACCA AGACCCCATC
8601 TGTACAAAAA AAAAATTTAG CCAGATGCGA TGGTATATGC CTGTGAGGCC
8651 CAGCTACCCA CGAAATTGAG GTGGGAGATT GCTTGAGCTT AGGAGTTCAA
8701 GGCTGAGACG GGCCATGATC ACACCACTAC ATTCCAGCCT GGTTGACAAA
8751 ATGAGACCCC ATCTCTAAAA AAAGAAAAGA AAAAAAGAAC AGTCTACTAA
8801 CAAAACGAAA ATACTGGACA ATAATCCTCT CTAAGTTGGG AGAAGGATAA
8851 TTAGAGTTAC AGTGTTCT (SEQ ID NO:3)



FEATURES:

Start: 3000
Exon: 3000-3575
Intron: 3576-4340
Exon: 4341-5867
Stop: 5868
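
The genomic feature coordinates can be checked for internal consistency with a few lines of arithmetic. This is a minimal sketch under the assumption that all coordinates are 1-based and inclusive and that the two listed exons carry the whole coding region; the variable names are hypothetical.

```python
# Coordinates copied from the genomic FEATURES block (1-based, inclusive).
exons = [(3000, 3575), (4341, 5867)]
intron = (3576, 4340)
stop = 5868

exon_lengths = [end - start + 1 for start, end in exons]
intron_length = intron[1] - intron[0] + 1
coding_nt = sum(exon_lengths)
codons = coding_nt // 3

# The intron should sit exactly between the exons, and the stop codon
# should begin immediately after the second exon.
contiguous = (
    intron[0] == exons[0][1] + 1
    and intron[1] + 1 == exons[1][0]
    and stop == exons[1][1] + 1
)

print(exon_lengths, intron_length, coding_nt, codons, contiguous)
```

With these coordinates the two exons total 2103 coding nucleotides, in agreement with the 701-residue protein of SEQ ID NO:2.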



FIGURE 3, page 3 of 4 



One non-coding Exon in the 5' UTR: 

(query = cDNA sequence; subject = genomic sequence)
Score = 174 bits (88), Expect = 7e-46 
Identities = 91/92 (98%) 
Strand = Plus/Plus 



Query: 1    tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcctcatcccag 60
            |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
Sbjct: 1313 tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcgtcatcccag 1372

Query: 61   ggccctccgcgcctgtgaggactccctcaggt 92
            ||||||||||||||||||||||||||||||||
Sbjct: 1373 ggccctccgcgcctgtgaggactccctcaggt 1404

CHROMOSOME MAP POSITION: 

Chromosome 4 

ALLELIC VARIANTS:

C/G nucleotide polymorphism at genomic position 1363 (in non-coding exon; see
cDNA/genomic sequence alignment above for the non-coding exon)

V/A amino acid polymorphism at protein position 699 
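
The genomic position of the nucleotide variant follows from the alignment offsets of the non-coding exon. The sketch below maps the mismatching alignment column to genomic coordinates; it assumes an ungapped alignment, uses the first alignment row as printed, and the helper name `to_genomic` is hypothetical.

```python
def to_genomic(query_pos, query_start, sbjct_start):
    """Map a 1-based cDNA position to genomic coordinates (ungapped block)."""
    return sbjct_start + (query_pos - query_start)

# First row of the non-coding exon alignment: cDNA 1-60 vs genomic 1313-1372.
query = "tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcctcatcccag"
sbjct = "tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcgtcatcccag"

variant_columns = [i + 1 for i, (q, s) in enumerate(zip(query, sbjct)) if q != s]
genomic_positions = [to_genomic(p, 1, 1313) for p in variant_columns]
alleles = [(query[p - 1].upper(), sbjct[p - 1].upper()) for p in variant_columns]

print(variant_columns, genomic_positions, alleles)
```

The only differing column is cDNA position 51, which maps to genomic position 1363 with C in the cDNA and G in the genomic sequence.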